Abstract

In recent years the ultrahigh dimensional linear regression problem has attracted enormous attention from the research community. Under the sparsity assumption, most of the published work is devoted to the selection and estimation of the significant predictor variables. This paper studies a different but fundamentally important aspect of this problem: uncertainty quantification for parameter estimates and model choices. To be more specific, this paper proposes methods for deriving a probability density function on the set of all possible models, and also for constructing confidence intervals for the corresponding parameters. These proposed methods are developed using the generalized fiducial methodology, which is a variant of Fisher's controversial fiducial idea. Theoretical properties of the proposed methods are studied, and in particular it is shown that statistical inference based on the proposed methods will have exact asymptotic frequentist property. In terms of empirical performance, the proposed methods are tested by simulation experiments and an application to a real data set. Lastly, this work can also be seen as an interesting and successful application of Fisher's fiducial idea to an important and contemporary problem. To the best of the authors' knowledge, this is the first time that the fiducial idea is being applied to a so-called "large p small n" problem.
1 Introduction



The ultrahigh dimensional linear regression problem has attracted enormous attention in recent years. A typical description of the problem begins with the usual linear model

$$Y_i = x_i^T \beta + \epsilon_i, \quad \text{or equivalently} \quad Y = X\beta + \epsilon,$$

where $Y = (Y_1, \ldots, Y_n)^T$ is a vector of $n$ responses, $X = (x_1, \ldots, x_n)^T$ is a design matrix of size $n \times p$ with i.i.d. variables $x_1, \ldots, x_n$, $\beta = (\beta_1, \ldots, \beta_p)^T$ is a vector of $p$ parameters, and $\epsilon = (\epsilon_1, \ldots, \epsilon_n)^T$ is a vector of $n$ i.i.d. random errors with zero mean and unknown variance $\sigma^2$. It is assumed that $\epsilon$ and $x_1, \ldots, x_n$ are independent, and that $p$ is larger than $n$ and grows at an exponential rate as $n$ increases. It is this last assumption that makes the ultrahigh dimensional regression problem different from the classical multiple regression problem, for which $p < n$.

When $p \gg n$, it is customary to assume that the number of significant predictors in the true model is small; i.e., the true model is sparse. The problem is then to identify which $\beta_j$'s are non-zero, and to estimate their values. To solve this variable selection problem, one common strategy is to first apply a so-called screening procedure to remove a large number of insignificant predictors, and then apply a penalized method such as the LASSO method of Tibshirani (1996) or the SCAD method of Fan and Li (2001) to the surviving predictors to select the final set of variables. For screening procedures, one of the earliest is the sure independence screening procedure of Fan and Lv (2008). Since then various screening procedures have been proposed: Wang (2009) developed a consistent screening procedure that combines forward regression and the extended BIC criterion of Chen and Chen (2008), Bühlmann et al. (2010) proposed a screening procedure that is based on conditional partial correlations, and Cho and Fryzlewicz (2011) constructed a screening procedure that utilizes information from both marginal correlation and tilted correlation. Also, other screening procedures have been developed for more complicated settings, including generalized linear models and nonparametric additive modeling; e.g., Meier et al. (2009), Ravikumar et al. (2009), Fan and Lv (2011) and Fan et al. (2011). For an overview of variable selection for high dimensional problems, see Fan and Lv (2010).

While much effort has been spent on model selection and parameter estimation for the ultrahigh dimensional regression problem, virtually no published work is devoted to quantifying the uncertainty in the chosen models and their parameter estimates. A notable exception is the pioneering work of Fan et al. (2012), where a cross-validation based method is proposed to estimate the error variance $\sigma^2$. Given such an estimate and a final model, confidence intervals for $\beta_j$'s can be constructed using classical linear model theory. However, this approach does not account for the additional variability contributed by the need to select a final model.

The goal of this paper is to investigate the use of Fisher's fiducial idea (Fisher, 1930) in the ultrahigh dimensional regression problem. In particular a new procedure is developed for constructing confidence intervals for all the parameters (including $\sigma$) in the final selected model. This procedure automatically accounts for the variability introduced by model selection. To the best of our knowledge, this is the first time that Fisher's fiducial idea is being applied to the so-called "large p small n" problem.



Fisher (1930) introduced fiducial inference in order to define a statistically meaningful distribution on the parameter space in cases when one cannot use Bayes' theorem due to the lack of prior information. While never formally defined, fiducial inference has a long and storied history. We refer an interested reader to Hannig (2009) and Salomé (1998), where a wealth of references can be found.

Ideas related to fiducial inference have experienced an exciting resurgence in the last decade. Some of these modern ideas are Dempster-Shafer calculus and its generalizations (Dempster, 2008; Martin et al., 2010; Zhang and Liu, 2011; Martin and Liu, 2013), confidence distributions (Singh et al., 2005; Xie et al., 2011), generalized inference (Weerahandi, 1993, 1995) and reference priors in objective Bayesian inference (Berger et al., 2009). There has also been a wealth of successful applications of these methods to practical problems. For selected examples see McNally et al. (2003), Wang and Iyer (2005), E et al. (2008), Edlefsen et al. (2009), Hannig and Lee (2009) and Cisewski and Hannig (2012).



The particular variant of Fisher's fiducial idea that this paper considers is the so-called generalized fiducial inference. Some early ideas were developed by Hannig et al. (2006), and later Hannig (2009) used these ideas to formally define a generalized fiducial distribution. A brief description of generalized fiducial inference is given below.

The rest of this paper is organized as follows. Section 2 provides some background material on generalized fiducial inference, and applies the methodology to the ultrahigh dimensional regression problem. The theoretical properties of the proposed solution are examined in Section 3, while its empirical properties are illustrated in Section 4. Lastly, concluding remarks are offered in Section 5 and technical details are deferred to the appendix.

2 Methodology 

Generalized fiducial inference begins with expressing the relationship between the data $Y$ and the parameters $\theta$ as

$$Y = G(U, \theta), \qquad (1)$$

where $G(\cdot,\cdot)$ is sometimes known as the structural equation, and $U$ is the random component of the relation whose distribution is completely known; e.g., a vector of i.i.d. $U(0,1)$'s. Recall that in the definition of the celebrated maximum likelihood estimator, Fisher "switched" the roles of $Y$ and $\theta$: the random $Y$ is treated as deterministic in the likelihood function, while the deterministic $\theta$ is treated as random. Through (1) generalized fiducial inference uses this "switching principle" to define a valid probability distribution on $\theta$.

This switching principle proceeds as follows. For the moment suppose for any given realization $y$ of $Y$, the inverse

$$\theta = G^{-1}(y, u) \qquad (2)$$

always exists for any realization $u$ of $U$. Since the distribution of $U$ is assumed known, one can always generate a random sample $\tilde u_1, \tilde u_2, \ldots$, and via (2) a random sample of $\theta$ can be obtained by $\tilde\theta_1 = G^{-1}(y, \tilde u_1), \tilde\theta_2 = G^{-1}(y, \tilde u_2), \ldots$. This is called a fiducial sample of $\theta$, which can be used to calculate estimates and construct confidence intervals for $\theta$ in a similar fashion as with a Bayesian posterior sample. Through the above switching and the inverse operations, one can see that a density function $r(\theta)$ for $\theta$ is implicitly defined. We term $r(\theta)$ the generalized fiducial density for $\theta$, and the corresponding distribution the generalized fiducial distribution for $\theta$. An illustrative example of applying this idea to simple linear regression can be found in Hannig and Lee (2009), and a formal mathematical definition of generalized fiducial inference is described in detail in Hannig (2009). The latter work also provides strategies to ensure the existence of the inverse (2).
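The switching principle can be illustrated on a toy problem that is not taken from this paper: a normal location model with known unit variance, reduced to its sample mean so that the structural equation $\bar y = \theta + U/\sqrt{n}$ with $U \sim N(0,1)$ has an explicit inverse. The function names and the choice of model below are assumptions made purely for illustration.

```python
import numpy as np


def fiducial_sample_mean(y, size=10_000, seed=0):
    """Fiducial sample for theta in the toy model ybar = theta + U / sqrt(n), U ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    u = rng.standard_normal(size)          # draws from the known distribution of U
    return y.mean() - u / np.sqrt(y.size)  # theta = G^{-1}(ybar, u)


def percentile_interval(sample, level=0.95):
    """Equal-tailed interval from the empirical quantiles of a fiducial sample."""
    alpha = 1.0 - level
    lower, upper = np.quantile(sample, [alpha / 2, 1 - alpha / 2])
    return float(lower), float(upper)


if __name__ == "__main__":
    y = np.linspace(-1.0, 3.0, 25)
    theta = fiducial_sample_mean(y)
    print(theta.mean(), percentile_interval(theta))
```

In this toy case the fiducial interval essentially coincides with the classical z-interval, which is why it serves as a sanity check rather than as a demonstration of the regression method.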

Observe that for the ultrahigh dimensional regression problem that this paper considers, $\theta$ can be decomposed into three components: $\theta = \{M, \sigma, \beta_M\}$, where $M$ denotes a candidate model and can be seen as a sequence of $p$ binary variables indicating which predictors are significant, $\sigma$ is the noise standard deviation and $\beta_M$ is the vector of coefficients of the significant predictors. In the next subsection we derive the generalized fiducial density $r(M)$ for $M$, and then we will demonstrate how to generate a fiducial sample $\{\tilde M, \tilde\sigma, \tilde\beta_{\tilde M}\}$ using $r(M)$.



2.1 Generalized Fiducial Density for Ultrahigh Dimensional Regression 

While the above formal definition of generalized fiducial inference is conceptually simple and very general, it may not be easily applicable in some practical situations. When the model dimension is known, Hannig (2013) derived a workable formula for $r(\theta)$ for many practical situations. Assume that the parameter $\theta \in \Theta \subset \mathbb{R}^d$ is $d$-dimensional and that the inverse $G^{-1}(y, \theta) = u$ to (1) exists. This assumption is satisfied for many natural structural equations, provided that $y$ and $u$ have the same dimension and $G$ is smooth. Note that this inverse is different from the inverse $G^{-1}$ in (2). Then under some differentiability assumptions, Hannig (2013) showed that the generalized fiducial distribution is absolutely continuous with density

$$r(\theta) = \frac{f(y,\theta)\, J(y,\theta)}{\int_{\Theta} f(y,\theta')\, J(y,\theta')\, d\theta'}, \qquad (3)$$

where

$$J(y,\theta) = \sum_{\substack{i = (i_1, \ldots, i_d) \\ 1 \le i_1 < \cdots < i_d \le n}} \left| \det\left( \left( \frac{d}{dy} G^{-1}(y,\theta) \right)^{-1} \frac{d}{d(\theta, y_{\bar i})} G^{-1}(y,\theta) \right) \right|. \qquad (4)$$

In the above $f(y,\theta)$ is the likelihood and the sum goes over all $d$-tuples of indexes $i = (1 \le i_1 < \cdots < i_d \le n) \subset \{1, \ldots, n\}$. Also, for each $i$ we denote the list of unused indexes by $\bar i = \{1, \ldots, n\} \setminus i$, the collection of variables indexed by $i$ by $y_i = (y_{i_1}, \ldots, y_{i_d})$, and its complement by $y_{\bar i} = (y_j : j \in \bar i)$. The formula $\frac{d}{d(\theta, y_{\bar i})} G^{-1}(y,\theta)$ stands for the Jacobian matrix computed with respect to all parameters $\theta$ and the observations $y_{\bar i}$. Similarly $\frac{d}{dy} G^{-1}(y,\theta)$ stands for the Jacobian matrix computed with respect to the observations $y$.



Recall that the formula (3) was derived for situations where the model dimension is known, and hence it cannot be directly applied to the current problem. When model selection is required, Hannig and Lee (2009) proposed adding extra penalty structural equations to (3). This is similar to adding a penalty term to the likelihood function to account for model complexity. In particular their derivation shows that the fiducial probability of each candidate model $M$ is proportional to

$$r(M) \propto \int f_M(y,\theta)\, J_M(y,\theta)\, d\theta \; e^{-q(M)}, \qquad (5)$$

where $f_M(y,\theta)$ is the likelihood, $J_M(y,\theta)$ is the Jacobian (4), and $q(M)$ is the penalty associated with the model $M$. In the context of wavelet regression, they recommended using the minimum description length (MDL) principle (Rissanen, 1989, 2007) to derive the penalty $q(M)$, which is shown to possess attractive theoretical and empirical properties.

Given the success of Hannig and Lee (2009), we also attempted to use the MDL principle to derive a penalty $q(M)$ for the current problem, which gives $q(M) = 0.5 |M| \log n$ with $|M|$ being the number of significant parameters in $M$. However, this form of $q(M)$ fails here, as the classical MDL principle was not designed to handle the "$p \gg n$" scenario. To overcome this issue, we propose using the following penalty

$$q(M) = \frac{|M|}{2} \log n + \gamma \log \binom{p}{|M|}, \qquad (6)$$

where the additional second term comes from the need to encode which of the parameters are left as zero. Here $\gamma$ is a constant measuring the quality of the encoding; the most natural choice is $\gamma = 1$ but other choices are possible. In all our numerical work we use $\gamma = 1$. We note that the second term of (6) is similar to the EBIC penalty of Chen and Chen (2008).



Denote the residual sum of squares of $M$ as $\mathrm{RSS}_M$ when the corresponding $\beta_M$ is estimated with maximum likelihood. Using penalty (6), for the current ultrahigh dimensional regression problem, it is shown in Appendix A that the fiducial probability for model $M$ is

$$r(M) \propto \Gamma\left( \frac{n - |M|}{2} \right) (\pi\, \mathrm{RSS}_M)^{-\frac{n - |M| - 1}{2}}\, n^{-\frac{|M| + 1}{2}} \binom{p}{|M|}^{-\gamma}. \qquad (7)$$



2.2 Practical Generation of Fiducial Sample 



In this subsection we propose a practical procedure for generating a fiducial sample $\{\tilde M, \tilde\sigma, \tilde\beta_{\tilde M}\}$ using (7). First note that even for a moderate $p$, the total number of models $2^p$ is huge and hence any method that is exhaustive in nature is computationally infeasible.

The proposed procedure begins with constructing a class of candidate models, denoted as $\mathcal{M}'$. This $\mathcal{M}'$ should satisfy the following two properties: $|\mathcal{M}'|$ is small, and it contains the true model and models that have non-negligible values of $r(M)$. To construct $\mathcal{M}'$, we first apply the sure independence screening (SIS) procedure of Fan and Lv (2008) to reduce the number of predictors from $p$ to $p'$, where $p'$ is of order $O(n)$. To further reduce the number of possible models (which is $2^{p'}$), we apply LASSO to those $p'$ predictors that survived SIS, and take all those models that lie on the LASSO solution path as $\mathcal{M}'$. Note that the LASSO solution path can be quickly obtained via the least angle regression method (Efron et al., 2004), and that constructing $\mathcal{M}'$ in this way will ensure the true model is captured in $\mathcal{M}'$ with probability tending to 1 (Fan and Lv, 2008).
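The two-stage construction of the candidate class can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: screening ranks predictors by absolute marginal correlation, and the LASSO path is approximated by warm-started coordinate descent on a grid of penalty values (a swapped-in substitute for least angle regression). The function names, the centering of the data, and the grid size are all hypothetical choices.

```python
import numpy as np


def sis_screen(X, y, keep):
    """Indices of the `keep` predictors with the largest absolute marginal correlation."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = Xc.T @ yc / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.sort(np.argsort(-np.abs(corr))[:keep])


def lasso_cd(X, y, lam, beta, n_iter=100):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1, warm-started at beta."""
    n = X.shape[0]
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y - X @ beta
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft threshold
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta


def candidate_models(X, y, keep, n_lambda=30):
    """Distinct non-empty supports met along a grid approximation of the LASSO path."""
    kept = sis_screen(X, y, keep)
    Xs = X[:, kept] - X[:, kept].mean(axis=0)  # centering is an assumption of this sketch
    ys = y - y.mean()
    lam_max = np.max(np.abs(Xs.T @ ys)) / len(ys)
    beta = np.zeros(Xs.shape[1])
    models = []
    for lam in lam_max * np.logspace(0, -2, n_lambda)[1:]:
        beta = lasso_cd(Xs, ys, lam, beta)
        support = tuple(int(k) for k in kept[np.flatnonzero(beta)])
        if support and support not in models:
            models.append(support)
    return models
```

Because a grid can skip over supports that the exact path visits, this approximation may return a slightly smaller class than least angle regression would.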



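Once $\mathcal{M}'$ is obtained, for each $M \in \mathcal{M}'$, calculate

$$R(M) = \Gamma\left( \frac{n - |M|}{2} \right) (\pi\, \mathrm{RSS}_M)^{-\frac{n - |M| - 1}{2}}\, n^{-\frac{|M| + 1}{2}} \binom{p}{|M|}^{-\gamma}$$

and approximate the generalized fiducial probability (7) by

$$r(M) \approx R(M) \Big/ \sum_{M' \in \mathcal{M}'} R(M'), \quad \text{for } M \in \mathcal{M}'. \qquad (8)$$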
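Evaluating $R(M)$ directly overflows for realistic $n$, so a numerical version would normally work on the log scale. The following sketch, with hypothetical function names, computes $\log R(M)$ from the formula above and normalizes over a given list of candidate models as in (8); models are assumed to be tuples of column indices.

```python
import math

import numpy as np


def log_binom(p, k):
    """Logarithm of the binomial coefficient C(p, k)."""
    return math.lgamma(p + 1) - math.lgamma(k + 1) - math.lgamma(p - k + 1)


def log_R(X, y, model, gamma=1.0):
    """log R(M) for the model given as a tuple of column indices of X."""
    n, p = X.shape
    k = len(model)
    if k == 0:
        rss = float(y @ y)
    else:
        XM = X[:, list(model)]
        beta_hat, *_ = np.linalg.lstsq(XM, y, rcond=None)
        rss = float(np.sum((y - XM @ beta_hat) ** 2))
    return (math.lgamma((n - k) / 2)
            - (n - k - 1) / 2 * math.log(math.pi * rss)
            - (k + 1) / 2 * math.log(n)
            - gamma * log_binom(p, k))


def fiducial_model_probs(X, y, models, gamma=1.0):
    """Normalized fiducial probabilities over the candidate models, as in (8)."""
    logs = np.array([log_R(X, y, m, gamma) for m in models])
    weights = np.exp(logs - logs.max())  # subtract the maximum for numerical stability
    return weights / weights.sum()
```

Subtracting the largest log-weight before exponentiating leaves the normalized probabilities unchanged while avoiding overflow.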
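Next, consider $\sigma$ and $\beta_M$. For any given $M$, it is straightforward to show that the generalized fiducial distribution of $\sigma$ conditional on $M$ is

$$\mathrm{RSS}_M / \sigma^2 \sim \chi^2_{n - |M|} \qquad (9)$$

and that of $\beta_M$ conditional on $M$ and $\sigma$ is

$$\beta_M \sim N\left( \hat\beta_M,\; \sigma^2 (X_M^T X_M)^{-1} \right), \qquad (10)$$

where $\hat\beta_M$ is the maximum likelihood estimate of $\beta_M$ for model $M$, and $X_M$ is the design matrix for model $M$.

Thus to generate $\{\tilde M, \tilde\sigma, \tilde\beta\}$, we first draw a model $\tilde M \in \mathcal{M}'$ from (8), then $\tilde\sigma$ from (9) given $\tilde M$, and lastly $\tilde\beta$ from (10) given $\{\tilde M, \tilde\sigma\}$.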
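The three-step draw can be sketched as below. The function name and interface are hypothetical: `models` is assumed to be a list of tuples of column indices and `probs` the corresponding probabilities from (8). Coefficients of predictors outside the drawn model are stored as exact zeros.

```python
import numpy as np


def draw_fiducial_sample(X, y, models, probs, size=1000, seed=0):
    """Draw (model index, sigma, beta) triples following (8), (9) and (10)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    fits = []
    for m in models:
        cols = list(m)
        XM = X[:, cols]
        beta_hat, *_ = np.linalg.lstsq(XM, y, rcond=None)
        rss = float(np.sum((y - XM @ beta_hat) ** 2))
        chol = np.linalg.cholesky(np.linalg.inv(XM.T @ XM))  # factor of (X_M^T X_M)^{-1}
        fits.append((cols, beta_hat, rss, chol))

    picks = rng.choice(len(models), size=size, p=probs)      # step 1: model from (8)
    sigmas = np.empty(size)
    betas = np.zeros((size, p))
    for s, i in enumerate(picks):
        cols, beta_hat, rss, chol = fits[i]
        sigma2 = rss / rng.chisquare(n - len(cols))          # step 2: sigma from (9)
        sigmas[s] = np.sqrt(sigma2)
        z = rng.standard_normal(len(cols))
        betas[s, cols] = beta_hat + sigmas[s] * (chol @ z)   # step 3: beta from (10)
    return picks, sigmas, betas
```

Pre-computing the least squares fit and the Cholesky factor once per candidate model keeps each draw cheap, which matters when the fiducial sample is large.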



2.3 Point Estimates and Confidence Intervals 

Applying the above procedure repeatedly one can obtain multiple copies of $\{\tilde M, \tilde\sigma, \tilde\beta\}$ that form a fiducial sample for $\{M, \sigma, \beta_M\}$. This fiducial sample can be used to form estimates and confidence intervals for $\sigma$ in a similar manner as with a Bayesian posterior sample. For example, the average of all $\tilde\sigma$'s can be used as an estimate of $\sigma$, while the 2.5% and 97.5% sample quantiles of the $\tilde\sigma$ values can be used respectively as the lower and upper limits of a 95% confidence interval for $\sigma$.

Obtaining estimates and confidence intervals for $\beta$ is, however, less straightforward, because any given $\beta_j$ may be included in some but not all $\tilde M$'s. In other words, some of the generated fiducial values for $\beta_j$ are zeros and some are not.

We use the following simple procedure to deal with this issue. For each $\beta_j$, we compute the percentage of zero fiducial sample values. If it is more than 50%, we declare that this particular $\beta_j$ is not significant. Otherwise, we treat $\beta_j$ as a significant parameter, and use all the non-zero fiducial sample values to obtain estimates and confidence intervals for it, in the same way as for $\sigma$. Note that a similar idea has been used by Barbieri and Berger (2004) to determine the significance of a parameter in the Bayesian context.
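The 50% rule can be written compactly. In the sketch below, `betas` is assumed to be an array with one fiducial draw per row and one coefficient per column, with exact zeros for coefficients excluded from the drawn model; the function name and return format are hypothetical.

```python
import numpy as np


def summarize_coefficients(betas, level=0.95):
    """Estimate and equal-tailed interval for each coefficient declared significant."""
    alpha = 1.0 - level
    summary = {}
    for j in range(betas.shape[1]):
        column = betas[:, j]
        if np.mean(column == 0) > 0.5:
            continue                      # more than 50% zeros: not significant
        nonzero = column[column != 0]     # only non-zero draws enter the summary
        lower, upper = np.quantile(nonzero, [alpha / 2, 1 - alpha / 2])
        summary[j] = (float(nonzero.mean()), float(lower), float(upper))
    return summary
```

A coefficient with exactly 50% zero draws is kept, matching the "more than 50%" wording of the rule.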



3 Theoretical Properties 

This section investigates the theoretical properties of the above-proposed generalized fiducial based method, under the situation that $p$ is diverging and the size of the true model is either fixed or diverging. For similar results in the classical situations where $p$ is fixed, see Hannig (2013).



First, some notation. Let $M$ be any model, $M_0$ be the true model, and $H_M$ be the projection matrix of $X_M$; i.e., $H_M = X_M (X_M^T X_M)^{-1} X_M^T$. Define

$$\Delta_M = \| \mu - H_M \mu \|^2,$$

where $\mu = E(Y) = X_{M_0} \beta_{M_0}$. Throughout this section we assume the following identifiability condition holds:

$$\lim_{n \to \infty} \min\left\{ \frac{\Delta_M}{|M_0| \log p} : M_0 \not\subset M,\; |M| \le k |M_0| \right\} = \infty \qquad (11)$$


for some fixed $k > 1$. This condition ensures that the true model is identifiable and has been used for example by Luo and Chen (2013). It can be shown that, under the sparse Riesz condition and the condition

$$\sqrt{\frac{n}{|M_0| \log p}}\; \min\{ |\beta_j| : j \in M_0 \} \to \infty,$$

the identifiability condition (11) holds. However, the converse does not hold in general.

Let $\mathcal{M}$ be the collection of models such that $\mathcal{M} = \{M : |M| \le k |M_0|\}$ for some fixed $k$. The restriction $|M| \le k |M_0|$ is imposed because in practice we only consider models with size comparable with the true model.

If $p$ is large, the size of $\mathcal{M}$ could still be too large in practice. In this situation, we could use a variable screening procedure to reduce the size. This variable screening procedure should result in a class of candidate models $\mathcal{M}'$ which satisfies

$$P(M_0 \in \mathcal{M}') \to 1 \quad \text{and} \quad \log(|\mathcal{M}'_j|) = o(j \log n), \qquad (12)$$

where $\mathcal{M}'_j$ contains all models in $\mathcal{M}'$ that are of size $j$. The first condition in (12) guarantees that the model class contains the true model, at least asymptotically. The second condition in (12) ensures that the size of the model class is not too large. These two conditions are satisfied by the practical algorithm presented in Section 2.2.

In Appendix B the following theorem is established.

Theorem 3.1. Under (11), as $n \to \infty$, $p \to \infty$, $|M_0| \log(p) = o(n)$, $\log(|M_0|)/\log(p) \to \delta$ and $\log(n)/\log(p) \to \eta$, there exists $\gamma > \frac{1+\delta}{1-\delta} - \frac{\eta}{2(1-\delta)}$ such that

$$\max_{M \ne M_0,\; M \in \mathcal{M}} r(M)/r(M_0) \xrightarrow{P} 0. \qquad (13)$$

Furthermore, if (12) holds, with the same $\gamma$,

$$r(M_0) \xrightarrow{P} 1 \qquad (14)$$

over the class $\mathcal{M}'$.

Equation (13) states that the true model has the highest generalized fiducial probability amongst all the models in $\mathcal{M}$. However, it does not imply equation (14) in general because the class of candidate models can be very large. If we constrain the class of models being considered in such a way that (12) holds, then equation (14) states that, with probability tending to 1, the true model will be selected. From Theorem 3.1 one can conclude the following important corollary.

Corollary 3.1. Statistical inference that is based on the generalized fiducial density (7) will have exact asymptotic frequentist property. Consequently the generalized fiducial distribution and derived point estimators are consistent.



4 Finite Sample Properties 

4.1 Simulations 

A simulation study was conducted to evaluate the practical performance of the proposed procedure. The following model from Fan et al. (2012) was used to generate the noisy data

$$Y = b (X_1 + \cdots + X_d) + \epsilon,$$

where $\epsilon$ is i.i.d. standard normal error, $d$ is the number of significant predictors, and the coefficient $b$ controls the signal-to-noise ratio. All the covariates are standard normal variables with correlation $\mathrm{cor}(X_i, X_j) = \rho^{|i-j|}$. Three combinations of $(n, p, d)$ were used: (200, 2000, 3), (300, 8000, 5) and (500, 50000, 8). For each of these three combinations, 3 choices of $b$ and 2 choices of $\rho$ were used: $b = 1/\sqrt{d}$, $2/\sqrt{d}$ and $3/\sqrt{d}$, and $\rho = 0$ and $0.5$. Therefore, a total of $3 \times 3 \times 2 = 18$ experimental configurations were considered. The number of repetitions for each experimental configuration was 1000. For $\rho = 0$, the cases $b = 1/\sqrt{d}$, $2/\sqrt{d}$ and $3/\sqrt{d}$ correspond to the cases when the signal-to-noise ratios are 1, 2 and 3 respectively.
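One way to generate data from this design is sketched below. The function name and argument names are hypothetical; the correlation structure $\rho^{|i-j|}$ is produced with a first-order autoregressive recursion across columns, which is one convenient construction and not necessarily the one used in the original study.

```python
import numpy as np


def simulate_data(n, p, d, b_mult, rho, seed=0):
    """Simulate Y = b (X_1 + ... + X_d) + eps with b = b_mult / sqrt(d)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n, p))
    X = np.empty((n, p))
    X[:, 0] = z[:, 0]
    scale = np.sqrt(1.0 - rho ** 2)
    for j in range(1, p):
        # AR(1) across columns: unit variance and cor(X_i, X_j) = rho^|i-j|
        X[:, j] = rho * X[:, j - 1] + scale * z[:, j]
    b = b_mult / np.sqrt(d)
    y = b * X[:, :d].sum(axis=1) + rng.standard_normal(n)
    return X, y


if __name__ == "__main__":
    X, y = simulate_data(n=200, p=2000, d=3, b_mult=2.0, rho=0.5)
    print(X.shape, y.shape)
```

With $\rho = 0$ the signal has variance $b^2 d$, so `b_mult` equals the signal-to-noise ratio on the standard deviation scale, consistent with the description above.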

For each generated data set, we applied the proposed generalized fiducial procedure described in Section 2.2 to obtain a fiducial sample of size 10,000 for $\{M, \sigma, \beta_M\}$, and from this we computed the generalized fiducial estimate for $\sigma^2$. We also obtained two other estimates for $\sigma^2$: the first one from the refitted cross-validation (RCV) method of Fan et al. (2012), while the second one is the classical maximum likelihood estimate for $\sigma^2$ obtained from the true model. Of course the last estimate cannot be obtained in practice, but it is computed here for benchmark comparisons. In the sequel it is termed the oracle estimate. Also, for RCV, the particular version we compared with is RCV-LASSO.

The biases of these three estimates for $\sigma^2$ are summarized in Table 1. From this table one can see that the biases of the fiducial estimates are usually not much larger than the biases of the oracle estimates. The RCV estimates sometimes have very large bias.





| $b$ | $\rho$ | method | $(n,p,d)$ = (200, 2000, 3) | $(n,p,d)$ = (300, 8000, 5) | $(n,p,d)$ = (500, 50000, 8) |
|---|---|---|---|---|---|
| $1/\sqrt{d}$ | 0 | proposed | -0.180 (0.323) | -0.166 (0.271) | 0.230 (0.219) |
| $1/\sqrt{d}$ | 0 | RCV | 1.507 (0.488) | -16.749 (0.330) | -27.287 (0.221) |
| $1/\sqrt{d}$ | 0 | oracle | -0.018 (0.317) | -0.115 (0.263) | -0.031 (0.200) |
| $2/\sqrt{d}$ | 0 | proposed | -0.511 (0.327) | -0.455 (0.259) | -0.089 (0.202) |
| $2/\sqrt{d}$ | 0 | RCV | -0.297 (0.465) | -7.932 (0.353) | -13.909 (0.255) |
| $2/\sqrt{d}$ | 0 | oracle | -0.383 (0.321) | -0.474 (0.260) | -0.151 (0.200) |
| $3/\sqrt{d}$ | 0 | proposed | -0.457 (0.332) | -0.112 (0.256) | 0.103 (0.203) |
| $3/\sqrt{d}$ | 0 | RCV | -0.495 (0.451) | -4.303 (0.362) | -7.245 (0.286) |
| $3/\sqrt{d}$ | 0 | oracle | -0.316 (0.328) | -0.283 (0.254) | -0.021 (0.201) |
| $1/\sqrt{d}$ | 0.5 | proposed | 0.352 (0.335) | 0.271 (0.285) | 1.046 (0.227) |
| $1/\sqrt{d}$ | 0.5 | RCV | 0.455 (0.467) | -10.333 (0.334) | -17.287 (0.247) |
| $1/\sqrt{d}$ | 0.5 | oracle | 0.367 (0.329) | -0.548 (0.258) | -0.406 (0.205) |
| $2/\sqrt{d}$ | 0.5 | proposed | -0.505 (0.328) | -0.092 (0.263) | -0.302 (0.199) |
| $2/\sqrt{d}$ | 0.5 | RCV | -0.533 (0.442) | -3.046 (0.357) | -6.73 (0.257) |
| $2/\sqrt{d}$ | 0.5 | oracle | -0.103 (0.325) | -0.160 (0.261) | -0.483 (0.198) |
| $3/\sqrt{d}$ | 0.5 | proposed | -1.585 (0.304) | 0.135 (0.259) | -0.080 (0.198) |
| $3/\sqrt{d}$ | 0.5 | RCV | -1.404 (0.430) | -2.275 (0.342) | -3.279 (0.274) |
| $3/\sqrt{d}$ | 0.5 | oracle | -1.251 (0.302) | -0.188 (0.258) | -0.355 (0.197) |

Table 1: Bias of the various estimates of $\sigma^2$. Numbers in parentheses are standard errors, reported in %.

We also obtained two sets of 90%, 95% and 99% confidence intervals for $\sigma^2$ from each simulated data set. The first set was computed using the proposed generalized fiducial method, and the second was calculated by applying classical theory to the true model. Again, the latter method cannot be used in practice, and is used for benchmark comparisons; i.e., the oracle method. The empirical coverage rates of these confidence intervals are summarized in Table 2. It can be seen that the generalized fiducial confidence intervals are nearly as good as the oracle confidence intervals.

| $(n,p,d)$ | $b$ | $\rho$ | method | 90% | 95% | 99% |
|---|---|---|---|---|---|---|
| (200, 2000, 3) | $1/\sqrt{d}$ | 0 | proposed | 0.895 (0.338) | 0.949 (0.405) | 0.985 (0.537) |
| (200, 2000, 3) | $1/\sqrt{d}$ | 0 | oracle | 0.896 (0.336) | 0.948 (0.402) | 0.985 (0.534) |
| (200, 2000, 3) | $2/\sqrt{d}$ | 0 | proposed | 0.892 (0.337) | 0.937 (0.404) | 0.987 (0.535) |
| (200, 2000, 3) | $2/\sqrt{d}$ | 0 | oracle | 0.892 (0.335) | 0.941 (0.401) | 0.988 (0.532) |
| (200, 2000, 3) | $3/\sqrt{d}$ | 0 | proposed | 0.884 (0.338) | 0.941 (0.404) | 0.986 (0.536) |
| (200, 2000, 3) | $3/\sqrt{d}$ | 0 | oracle | 0.886 (0.335) | 0.943 (0.401) | 0.986 (0.533) |
| (200, 2000, 3) | $1/\sqrt{d}$ | 0.5 | proposed | 0.895 (0.344) | 0.945 (0.412) | 0.988 (0.547) |
| (200, 2000, 3) | $1/\sqrt{d}$ | 0.5 | oracle | 0.896 (0.338) | 0.946 (0.404) | 0.988 (0.536) |
| (200, 2000, 3) | $2/\sqrt{d}$ | 0.5 | proposed | 0.889 (0.339) | 0.939 (0.405) | 0.991 (0.538) |
| (200, 2000, 3) | $2/\sqrt{d}$ | 0.5 | oracle | 0.891 (0.336) | 0.94 (0.402) | 0.991 (0.534) |
| (200, 2000, 3) | $3/\sqrt{d}$ | 0.5 | proposed | 0.906 (0.335) | 0.955 (0.401) | 0.993 (0.532) |
| (200, 2000, 3) | $3/\sqrt{d}$ | 0.5 | oracle | 0.908 (0.332) | 0.957 (0.397) | 0.992 (0.528) |
| (300, 8000, 5) | $1/\sqrt{d}$ | 0 | proposed | 0.891 (0.277) | 0.948 (0.331) | 0.985 (0.438) |
| (300, 8000, 5) | $1/\sqrt{d}$ | 0 | oracle | 0.898 (0.273) | 0.948 (0.326) | 0.987 (0.432) |
| (300, 8000, 5) | $2/\sqrt{d}$ | 0 | proposed | 0.909 (0.275) | 0.951 (0.328) | 0.987 (0.434) |
| (300, 8000, 5) | $2/\sqrt{d}$ | 0 | oracle | 0.904 (0.272) | 0.95 (0.325) | 0.985 (0.43) |
| (300, 8000, 5) | $3/\sqrt{d}$ | 0 | proposed | 0.913 (0.274) | 0.953 (0.328) | 0.993 (0.433) |
| (300, 8000, 5) | $3/\sqrt{d}$ | 0 | oracle | 0.907 (0.273) | 0.955 (0.326) | 0.993 (0.431) |
| (300, 8000, 5) | $1/\sqrt{d}$ | 0.5 | proposed | 0.887 (0.286) | 0.936 (0.342) | 0.984 (0.453) |
| (300, 8000, 5) | $1/\sqrt{d}$ | 0.5 | oracle | 0.898 (0.272) | 0.948 (0.325) | 0.992 (0.43) |
| (300, 8000, 5) | $2/\sqrt{d}$ | 0.5 | proposed | 0.894 (0.275) | 0.947 (0.328) | 0.99 (0.434) |
| (300, 8000, 5) | $2/\sqrt{d}$ | 0.5 | oracle | 0.893 (0.273) | 0.946 (0.326) | 0.992 (0.432) |
| (300, 8000, 5) | $3/\sqrt{d}$ | 0.5 | proposed | 0.906 (0.274) | 0.954 (0.328) | 0.99 (0.433) |
| (300, 8000, 5) | $3/\sqrt{d}$ | 0.5 | oracle | 0.906 (0.273) | 0.952 (0.326) | 0.99 (0.432) |
| (500, 50000, 8) | $1/\sqrt{d}$ | 0 | proposed | 0.88 (0.215) | 0.939 (0.257) | 0.989 (0.339) |
| (500, 50000, 8) | $1/\sqrt{d}$ | 0 | oracle | 0.909 (0.211) | 0.952 (0.252) | 0.99 (0.332) |
| (500, 50000, 8) | $2/\sqrt{d}$ | 0 | proposed | 0.898 (0.212) | 0.942 (0.253) | 0.991 (0.333) |
| (500, 50000, 8) | $2/\sqrt{d}$ | 0 | oracle | 0.899 (0.211) | 0.942 (0.251) | 0.991 (0.332) |
| (500, 50000, 8) | $3/\sqrt{d}$ | 0 | proposed | 0.901 (0.212) | 0.952 (0.253) | 0.991 (0.333) |
| (500, 50000, 8) | $3/\sqrt{d}$ | 0 | oracle | 0.9 (0.211) | 0.953 (0.252) | 0.992 (0.332) |
| (500, 50000, 8) | $1/\sqrt{d}$ | 0.5 | proposed | 0.865 (0.224) | 0.935 (0.267) | 0.985 (0.352) |
| (500, 50000, 8) | $1/\sqrt{d}$ | 0.5 | oracle | 0.9 (0.21) | 0.94 (0.251) | 0.99 (0.331) |
| (500, 50000, 8) | $2/\sqrt{d}$ | 0.5 | proposed | 0.895 (0.211) | 0.95 (0.252) | 0.993 (0.332) |
| (500, 50000, 8) | $2/\sqrt{d}$ | 0.5 | oracle | 0.895 (0.21) | 0.949 (0.251) | 0.992 (0.331) |
| (500, 50000, 8) | $3/\sqrt{d}$ | 0.5 | proposed | 0.905 (0.211) | 0.947 (0.251) | 0.989 (0.331) |
| (500, 50000, 8) | $3/\sqrt{d}$ | 0.5 | oracle | 0.903 (0.21) | 0.945 (0.251) | 0.99 (0.331) |

Table 2: Empirical coverage rates for various confidence intervals for $\sigma^2$. Numbers in parentheses are averaged widths of the confidence intervals.

Lastly, for each simulated data set we applied three methods to compute the confidence intervals for the regression coefficients $\beta_j$'s and the mean function $E(Y_i \mid x_i)$ evaluated at 50 randomly selected design points $x_i$'s. The three methods are the proposed generalized fiducial method, the RCV method of Fan et al. (2012), and the oracle method that uses the true model. As before the empirical coverage rates of these confidence intervals are calculated and they are reported in Tables 3 and 4. Note that only the confidence intervals for $\beta_1$ are reported, as the confidence intervals for other $\beta_j$'s have similar coverage rates. Overall one can see that the generalized fiducial method gave quite reliable results, except for a few experimental settings where the confidence intervals were over-liberal.

In an attempt to produce a single summary statistic for comparing the empirical coverage rates of the confidence intervals produced by different methods, the following calculation has been done. For all the 90% generalized fiducial confidence intervals for $\beta_1$, we counted the number of times that their empirical coverage rates are within the range $(1 - \alpha) \pm 1.96 \sqrt{\alpha (1 - \alpha) / N_{\mathrm{sim}}}$, where $\alpha = 0.10$ and $N_{\mathrm{sim}} = 1000$ is the number of repetitions performed for each experimental setting. Similar calculations were then performed for the 95% and 99% (i.e., $\alpha = 0.05$ and $\alpha = 0.01$) confidence intervals. It turns out that, for the proposed generalized fiducial method, out of the 54 empirical coverage rates, 33 of them are within their corresponding target ranges. We have also done the same calculations for the RCV and the oracle methods, and the numbers of their empirical coverage rates that are inside their target ranges are, respectively, 17 and 50. Lastly, we repeated the same calculations for the empirical coverage rates for $E(Y_i \mid x_i)$, and the corresponding numbers for the proposed, RCV and oracle methods are, respectively, 44, 23 and 54. Of course, these numbers are not perfect for judging the relative merits of the different methods, but they seem to suggest that the proposed generalized fiducial method provides improvement over the RCV method.
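The target range used in this count is the usual normal approximation to the binomial sampling error of an empirical coverage rate. A small sketch, with hypothetical function names, makes the calculation explicit:

```python
import math


def target_range(alpha, n_sim=1000):
    """Range (1 - alpha) +/- 1.96 * sqrt(alpha (1 - alpha) / n_sim)."""
    half_width = 1.96 * math.sqrt(alpha * (1.0 - alpha) / n_sim)
    return 1.0 - alpha - half_width, 1.0 - alpha + half_width


def count_within_target(coverages, alpha, n_sim=1000):
    """Number of empirical coverage rates that fall inside the target range."""
    lower, upper = target_range(alpha, n_sim)
    return sum(lower <= c <= upper for c in coverages)


if __name__ == "__main__":
    print(target_range(0.10))
    print(count_within_target([0.888, 0.810, 0.908], alpha=0.10))
```

For $\alpha = 0.10$ and 1000 repetitions the range is roughly 0.881 to 0.919, so an empirical coverage of 0.888 counts as on target while 0.810 does not.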

4.2 Real Data Example: Housing Price Appreciation 

This section analyses a data set that contains 119 months of housing price appreciation (HPA) of the national house price index (HPI) for 381 core-based statistical areas (CBSAs) in the United States. Here HPA is defined as the percentage of monthly change in log-HPI for each of the 381 CBSAs. The goal of the analysis is to predict future HPA values for these CBSAs using existing data. This data set was recorded from 1996 to 2005, and has been studied for example by Fan et al. (2011).

Of course, house prices depend on geographical locations and various macroeconomic factors. As argued by Fan et al. (2012), effects from macroeconomic factors can be well summarized by the national HPA. Let $X_{t,j}$ be the HPA of the $j$-th CBSA in month $t$, and $X_{t,N}$ be the national HPA of month $t$. Then for any $k = 1, \ldots, 381$, a reasonable model for a 1-year ahead HPA prediction for the $k$-th CBSA is

381 

where the $\beta$'s are model parameters and the error term is an independent random error. Given the national HPA, it is reasonable to assume that areas that are far away would have minimal influence on the local house prices; therefore one can assume that the CBSA-specific coefficients are sparse.
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90% 95% 99% 


{n,p,d) = (200,2000,3) 


p = 


proposed 
RCV 
oracle 


0.888 (0.236) 0.946 (0.283) 0.987 (0.377) 
0.869 (0.250) 0.915 (0.298) 0.956 (0.392) 
0.897 (0.235) 0.946 (0.279) 0.988 (0.367) 


p = 


proposed 
RCV 
oracle 


0.884 (0.235) 0.948 (0.282) 0.991 (0.376) 
0.887 (0.238) 0.945 (0.284) 0.988 (0.373) 
0.889 (0.234) 0.946 (0.279) 0.990 (0.367) 


b = 3/V3 
p = 


proposed 
RCV 
oracle 


0.892 (0.236) 0.947 (0.282) 0.987 (0.376) 
0.896 (0.238) 0.95 (0.284) 0.99 (0.373) 
0.897 (0.234) 0.952 (0.279) 0.987 (0.367) 


b = 1/v^ 
p = 0.5 


proposed 
RCV 
oracle 


0.886 (0.282) 0.936 (0.338) 0.985 (0.454) 
0.814 (0.289) 0.849 (0.345) 0.902 (0.453) 
0.894 (0.271) 0.943 (0.323) 0.988 (0.424) 


b = 2/V3 
p = 0.5 


proposed 
RCV 
oracle 


0.898 (0.271) 0.944 (0.325) 0.987 (0.433) 
0.903 (0.274) 0.945 (0.326) 0.988 (0.429) 
0.894 (0.270) 0.949 (0.322) 0.986 (0.423) 


b = 3/VS 
p = 0.5 


proposed 
RCV 
oracle 


0.901 (0.269) 0.948 (0.322) 0.989 (0.429) 
0.899 (0.271) 0.953 (0.323) 0.988 (0.424) 
0.897 (0.269) 0.955 (0.321) 0.99 (0.422) 


in,p,d) = (300,8000,5) 


b = 1/^/5 
p = 


proposed 
RCV 
oracle 


0.810 (0.191) 0.896 (0.229) 0.976 (0.303) 
0.903 (0.204) 0.935 (0.243) 0.956 (0.320) 
0.900 (0.192) 0.948 (0.229) 0.992 (0.301) 


6 = 2/^5 
p = 


proposed 
RCV 
oracle 


0.871 (0.189) 0.936 (0.226) 0.984 (0.300) 
0.897 (0.201) 0.936 (0.239) 0.981 (0.315) 
0.907 (0.191) 0.959 (0.228) 0.989 (0.300) 


b = 3/V5 
p = 


proposed 
RCV 
oracle 


0.888 (0.19) 0.934 (0.227) 0.984 (0.301) 
0.900 (0.197) 0.945 (0.235) 0.979 (0.309) 
0.879 (0.192) 0.941 (0.228) 0.991 (0.300) 


6 = 1/^5 
p = 0.5 


proposed 
RCV 
oracle 


0.812 (0.269) 0.887 (0.322) 0.963 (0.427) 
0.871 (0.236) 0.915 (0.281) 0.960 (0.369) 
0.912 (0.221) 0.954 (0.264) 0.992 (0.346) 


6 = 2/^5 
p = 0.5 


proposed 
RCV 
oracle 


0.895 (0.250) 0.949 (0.299) 0.989 (0.396) 
0.864 (0.224) 0.922 (0.266) 0.975 (0.350) 
0.891 (0.222) 0.950 (0.264) 0.991 (0.347) 


6 = 3/^5 
p = 0.5 


proposed 
RCV 
oracle 


0.908 (0.250) 0.950 (0.299) 0.990 (0.397) 
0.852 (0.220) 0.917 (0.262) 0.975 (0.344) 
0.904 (0.222) 0.949 (0.264) 0.983 (0.347) 


{n,p,d) = (500,50000,8) 


b = i/Vs 

p = 


proposed 
RCV 
oracle 


0.781 (0.148) 0.875 (0.177) 0.978 (0.233) 
0.813 (0.151) 0.857 (0.180) 0.884 (0.237) 
0.910 (0.149) 0.954 (0.177) 0.993 (0.233) 


b = 2/Vs 
p = 


proposed 
RCV 
oracle 


0.853 (0.147) 0.919 (0.176) 0.980 (0.232) 
0.804 (0.156) 0.878 (0.186) 0.965 (0.244) 
0.902 (0.148) 0.947 (0.177) 0.988 (0.232) 


6 = 3/^8 
p = 


proposed 
RCV 
oracle 


0.873 (0.147) 0.925 (0.176) 0.986 (0.232) 
0.841 (0.155) 0.911 (0.184) 0.981 (0.242) 
0.897 (0.149) 0.944 (0.177) 0.988 (0.233) 


b = l/^/S 
p = 0.5 


proposed 
RCV 
oracle 


0.820 (0.206) 0.885 (0.246) 0.950 (0.324) 
0.895 (0.179) 0.935 (0.213) 0.965 (0.280) 
0.925 (0.172) 0.965 (0.204) 0.995 (0.269) 


b = 2/V8 
p = 0.5 


proposed 
RCV 
oracle 


0.897 (0.193) 0.949 (0.230) 0.988 (0.304) 
0.861 (0.169) 0.922 (0.202) 0.976 (0.265) 
0.893 (0.171) 0.944 (0.204) 0.989 (0.268) 


6 = 3/^8 
p = 0.5 


proposed 
RCV 
oracle 


0.888 (0.193) 0.945 (0.230) 0.989 (0.304) 
0.840 (0.168) 0.909 (0.201) 0.968 (0.264) 
0.899 (0.171) 0.942 (0.204) 0.987 (0.268) 



Table 3: Empirical coverage rates for the confidence intervals for /3i. The numbers in the 
parentheses are the averaged widths of the corresponding confidence intervals. 
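The two quantities reported in each cell of these tables, the empirical coverage rate and the averaged interval width, can be computed from simulated intervals as in the following minimal sketch. The function name `coverage_and_width` and the example intervals are hypothetical and are not taken from the simulation study itself.

```python
def coverage_and_width(intervals, true_value):
    """Return (empirical coverage rate, average width).

    intervals  : list of (lower, upper) confidence limits, one per simulated data set
    true_value : the parameter value used to generate the data
    """
    hits = [lower <= true_value <= upper for lower, upper in intervals]
    widths = [upper - lower for lower, upper in intervals]
    return sum(hits) / len(hits), sum(widths) / len(widths)


# Hypothetical intervals from four simulated data sets; the true parameter is 1.0.
example_intervals = [(0.8, 1.2), (0.9, 1.3), (1.05, 1.4), (0.7, 1.1)]
rate, width = coverage_and_width(example_intervals, 1.0)
print(rate, width)
```

A procedure with exact frequentist coverage would give a rate close to the nominal level (0.90, 0.95 or 0.99), so among procedures with correct coverage the one with the smaller average width is preferable.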

(n, p, d) = (200, 2000, 3)

b      ρ     method     90%            95%            99%
1/√3   0     proposed   0.899 (0.421)  0.948 (0.511)  0.988 (0.696)
             RCV        0.966 (1.160)  0.981 (1.382)  0.993 (1.817)
             oracle     0.896 (0.343)  0.947 (0.409)  0.989 (0.538)
2/√3   0     proposed   0.903 (0.424)  0.953 (0.516)  0.990 (0.704)
             RCV        0.857 (0.603)  0.910 (0.718)  0.966 (0.944)
             oracle     0.888 (0.342)  0.944 (0.408)  0.988 (0.536)
3/√3   0     proposed   0.911 (0.428)  0.956 (0.519)  0.991 (0.709)
             RCV        0.931 (0.605)  0.965 (0.720)  0.992 (0.947)
             oracle     0.897 (0.343)  0.947 (0.409)  0.987 (0.537)
1/√3   0.5   proposed   0.903 (0.452)  0.948 (0.549)  0.987 (0.748)
             RCV        0.925 (1.281)  0.943 (1.526)  0.964 (2.005)
             oracle     0.892 (0.344)  0.944 (0.410)  0.987 (0.538)
2/√3   0.5   proposed   0.910 (0.444)  0.955 (0.538)  0.990 (0.733)
             RCV        0.855 (0.583)  0.907 (0.695)  0.963 (0.914)
             oracle     0.896 (0.343)  0.948 (0.408)  0.988 (0.536)
3/√3   0.5   proposed   0.913 (0.438)  0.959 (0.532)  0.993 (0.725)
             RCV        0.925 (0.492)  0.961 (0.587)  0.993 (0.771)
             oracle     0.899 (0.342)  0.947 (0.408)  0.989 (0.536)

(n, p, d) = (300, 8000, 5)

b      ρ     method     90%            95%            99%
1/√5   0     proposed   0.888 (0.444)  0.938 (0.536)  0.981 (0.725)
             RCV        0.951 (1.864)  0.973 (2.221)  0.990 (2.919)
             oracle     0.898 (0.388)  0.950 (0.462)  0.990 (0.607)
2/√5   0     proposed   0.909 (0.439)  0.956 (0.531)  0.992 (0.724)
             RCV        0.949 (1.291)  0.977 (1.538)  0.995 (2.022)
             oracle     0.900 (0.386)  0.949 (0.460)  0.990 (0.605)
3/√5   0     proposed   0.909 (0.429)  0.957 (0.519)  0.992 (0.708)
             RCV        0.942 (0.915)  0.973 (1.090)  0.995 (1.432)
             oracle     0.897 (0.387)  0.948 (0.461)  0.990 (0.606)
1/√5   0.5   proposed   0.871 (0.496)  0.925 (0.602)  0.975 (0.820)
             RCV        0.953 (1.641)  0.978 (1.956)  0.996 (2.570)
             oracle     0.898 (0.387)  0.947 (0.461)  0.988 (0.606)
2/√5   0.5   proposed   0.914 (0.437)  0.962 (0.531)  0.994 (0.728)
             RCV        0.947 (0.741)  0.977 (0.883)  0.996 (1.160)
             oracle     0.901 (0.387)  0.954 (0.461)  0.991 (0.606)
3/√5   0.5   proposed   0.914 (0.422)  0.960 (0.512)  0.993 (0.701)
             RCV        0.914 (0.431)  0.958 (0.514)  0.992 (0.676)
             oracle     0.900 (0.388)  0.951 (0.462)  0.991 (0.607)

(n, p, d) = (500, 50000, 8)

b      ρ     method     90%            95%            99%
1/√8   0     proposed   0.841 (0.445)  0.896 (0.534)  0.951 (0.711)
             RCV        0.934 (1.889)  0.960 (2.251)  0.983 (2.958)
             oracle     0.902 (0.409)  0.953 (0.488)  0.991 (0.641)
2/√8   0     proposed   0.907 (0.435)  0.955 (0.522)  0.991 (0.697)
             RCV        0.951 (1.573)  0.980 (1.874)  0.997 (2.463)
             oracle     0.903 (0.409)  0.951 (0.487)  0.990 (0.640)
3/√8   0     proposed   0.900 (0.429)  0.951 (0.515)  0.990 (0.687)
             RCV        0.957 (1.187)  0.983 (1.415)  0.998 (1.860)
             oracle     0.898 (0.409)  0.949 (0.488)  0.989 (0.641)
1/√8   0.5   proposed   0.829 (0.501)  0.892 (0.601)  0.958 (0.803)
             RCV        0.945 (1.713)  0.978 (2.041)  0.996 (2.682)
             oracle     0.905 (0.408)  0.951 (0.486)  0.992 (0.639)
2/√8   0.5   proposed   0.907 (0.430)  0.956 (0.517)  0.993 (0.693)
             RCV        0.951 (0.708)  0.979 (0.844)  0.997 (1.109)
             oracle     0.900 (0.408)  0.951 (0.487)  0.992 (0.640)
3/√8   0.5   proposed   0.903 (0.421)  0.953 (0.505)  0.991 (0.675)
             RCV        0.900 (0.417)  0.949 (0.497)  0.990 (0.653)
             oracle     0.898 (0.409)  0.949 (0.487)  0.990 (0.640)

Table 4: Empirical coverage rates for the confidence intervals for $E(Y_i \mid x_i)$. The numbers in
parentheses are the averaged widths of the corresponding confidence intervals.

Note that for any given k, we have "p > n", as p = 382 and n = 119.

For illustrative purposes, we apply the proposed generalized fiducial procedure to the above
model for one of the CBSAs: San Francisco-San Mateo-Redwood. Two fitted models with
non-negligible fiducial probabilities are returned: with probability 0.335 the housing appreciation
of this area depends on itself and its nearby CBSA San Jose-San Francisco-Oakland, while
with probability about 0.663 it depends only on the CBSA San Jose-San Francisco-Oakland.
We also obtained an estimate of the noise standard deviation $\sigma$, which can be interpreted
as a measure of prediction accuracy when forecasting the housing appreciation. Our point
estimate for $\sigma$ is 0.56, with a 95% confidence interval of (0.48, 0.65). Our point estimate
agrees with those reported in Fan et al. (2012), although no confidence intervals are reported there.



5 Conclusion 

In this paper we studied the issue of uncertainty quantification in the ultrahigh dimensional
regression problem. We applied the generalized fiducial inference methodology to develop an
inferential procedure for this problem. Our theoretical results show that estimates obtained
by this procedure are consistent, while confidence intervals constructed by this procedure are
asymptotically correct in the frequentist sense. Numerical results from simulation experiments
agree with these theoretical findings. To the best of our knowledge, very few published papers
are devoted to quantifying uncertainties in the ultrahigh dimensional regression problem, and
hence the current paper is one of the first to provide a systematic treatment of this problem.
It also opens the possibility of using fiducial and related methods for conducting
statistical inference for other "large p small n" problems, such as classification and covariance
matrix estimation.

A Derivation of (7)

This appendix derives the generalized fiducial density (7). A major challenge is to obtain a
computable expression for the Jacobian.

First observe that the Jacobian term $J(y,\theta)$ can be further simplified. The product of Jacobian
matrices in each of the summands of $J(y,\theta)$ simplifies to a matrix containing the $d$ columns of the
$n \times d$ matrix $\{\frac{d}{dy}G^{-1}(y,\theta)\}^{-1}\frac{d}{d\theta}G^{-1}(y,\theta)$ and the $n-d$ columns of the identity matrix
with columns $i_1,\ldots,i_d$ removed. Thus we have

$$J(y,\theta) = \sum_{i=(i_1,\ldots,i_d)} \left|\det\left(\left\{\frac{d}{dy}G^{-1}(y,\theta)\right\}^{-1}\frac{d}{d\theta}G^{-1}(y,\theta)\right)_i\right|, \tag{15}$$

where for any $n \times d$ matrix $A$, the sub-matrix $(A)_i$ is the $d \times d$ matrix containing the rows
$i_1,\ldots,i_d$ of $A$.

Next notice that each candidate model is a multiple regression model, with an
implicit structural equation

$$Y = X_M\beta_M + \sigma Z,$$

where $Y$ is the vector of observations, $X_M$ is the design matrix for model $M$, $\beta_M \in \mathbb{R}^{|M|}$ and $\sigma > 0$
are parameters, and $Z$ is a vector of i.i.d. standard normal random variables. Plugging this
into (15), after some calculations one has

$$J_M(y,\theta) = \sigma^{-1} \sum_{\substack{i=(i_0,\ldots,i_{|M|}) \\ 1\le i_0<\cdots<i_{|M|}\le n}} \left|\det\,(y, X_M)_i\right|.$$



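Substituting this into (5) we have

$$r(M) \propto \sum_{\substack{i=(i_0,\ldots,i_{|M|}) \\ 1\le i_0<\cdots<i_{|M|}\le n}} \left|\det\,(y, X_M)_i\right| \, \Gamma\!\left(\frac{n-m}{2}\right) (\pi\,\mathrm{RSS}_M)^{-\frac{n-m}{2}} \left|\det(X_M^\top X_M)\right|^{-\frac{1}{2}} q(M), \tag{16}$$

where $m = |M|$, $\mathrm{RSS}_M$ denotes the residual sum of squares of model $M$ when the parameters are
estimated using maximum likelihood, and $q(M)$ is the penalty term, introduced in the main text, that
controls the model dimension.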

The expression (16) has done well in our simulations. However, the need for computing a
sum of $\binom{n}{|M|+1}$ terms makes it very computationally expensive. To seek a faster alternative,
we re-express the response $Y$ for each fixed model as a column vector

$$V_M = \left[(X_M^\top X_M)^{-1/2}X_M^\top y;\ (\mathrm{RSS}_M)^{1/2};\ \{I - X_M(X_M^\top X_M)^{-1}X_M^\top\}\,y/\mathrm{RSS}_M^{1/2}\right].$$

With this the Jacobian (15) becomes

$$J'_M(y,\theta) = \sum_{i} \left|\det\left(\frac{dV_M}{dy}\left\{\frac{d}{dy}G_M^{-1}(y,\beta_M,\sigma)\right\}^{-1}\frac{d}{d(\beta_M,\sigma)}G_M^{-1}(y,\beta_M,\sigma)\right)_i\right| = \sigma^{-1}\left|\det(X_M^\top X_M)\right|^{\frac{1}{2}}\mathrm{RSS}_M^{\frac{1}{2}}.$$

The simplification in the previous formula happens because all but the first $m + 1$ rows of the
matrix obtained as the product of matrices in the above expression are 0, and we therefore have
only one non-zero determinant in the sum. This together with the penalty $q(M)$ brings us to the
final generalized fiducial distribution (7).
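The computational difference between the two Jacobians can be illustrated with a small numerical sketch. It assumes NumPy is available; the function names `jacobian_bruteforce` and `jacobian_fast` and the toy data are hypothetical. The first function evaluates $\sigma^{-1}\sum_i|\det(y,X_M)_i|$ by enumerating all row subsets of size $|M|+1$, the second evaluates the closed form $\sigma^{-1}|\det(X_M^\top X_M)|^{1/2}\mathrm{RSS}_M^{1/2}$. The two expressions are different Jacobians and need not agree numerically; what they share is the form $C_M(y)\sigma^{-1}$.

```python
from itertools import combinations
from math import comb

import numpy as np


def jacobian_bruteforce(y, X_M, sigma):
    """sigma^{-1} * sum over row subsets i of |det (y, X_M)_i| (feasible for tiny n only)."""
    n, m = X_M.shape
    A = np.column_stack([y, X_M])  # n x (m + 1) matrix (y, X_M)
    total = 0.0
    for rows in combinations(range(n), m + 1):
        total += abs(np.linalg.det(A[list(rows), :]))
    return total / sigma


def jacobian_fast(y, X_M, sigma):
    """sigma^{-1} * |det(X_M' X_M)|^{1/2} * RSS_M^{1/2}, a single determinant."""
    beta_hat, *_ = np.linalg.lstsq(X_M, y, rcond=None)  # least squares fit
    resid = y - X_M @ beta_hat
    rss = float(resid @ resid)
    gram_det = np.linalg.det(X_M.T @ X_M)
    return float(np.sqrt(abs(gram_det)) * np.sqrt(rss) / sigma)


# Toy data: n = 3 observations and an intercept-only model (|M| = 1).
y = np.array([1.0, 2.0, 4.0])
X_M = np.ones((3, 1))

slow = jacobian_bruteforce(y, X_M, sigma=2.0)
fast = jacobian_fast(y, X_M, sigma=2.0)
n_terms = comb(3, 2)  # number of determinants in the brute-force sum
print(slow, fast, n_terms)
```

For realistic sizes such as $n = 200$ and $|M| = 3$, the brute-force sum has $\binom{200}{4}$ terms, which is why the closed form is preferred in practice.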

Notice that both $J_M(y,\theta)$ and $J'_M(y,\theta)$ are of the form $C_M(y)\sigma^{-1}$, where $C_M(y)$ is a
specific constant depending only on the observed data. Therefore the Jacobians can be viewed
as improper Bayesian priors $\pi(\beta_M,\sigma^2) \propto \sigma^{-2}$. As discussed in Berger and Pericchi (2001), one of
the issues with the use of improper priors in Bayesian model selection is that the choice of
the constant $c_b$ in the prior $c_b\sigma^{-2}$ is arbitrary. This is not a problem when the posterior with
respect to a single model is considered, because the arbitrary constant cancels. However, it becomes
a problem in model selection, as the arbitrary constants $c_b$ influence the result, making the use
of improper priors for model selection difficult. Thus a contribution of fiducial inference is the
choice of a particular constant $c_b$ for each model.

B Proof of Theorem 3.1

B.1 Lemmas

First we present three lemmas, whose detailed proofs can be found in Luo and Chen (2013).
Lemma B.1 is proved by applying Stirling's formula. Lemma B.2 is proved by integration by
parts, and Lemma B.3 is proved by applying Lemma B.2.

Lemma B.1. If $\log j/\log p \to \delta$ as $p \to \infty$, then

$$\log\binom{p}{j} = j\log p\,(1-\delta)(1+o(1)).$$
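As a rough numerical illustration of Lemma B.1 (not part of the proof), the sketch below compares $\log\binom{p}{j}$ with $j\log p\,(1-\delta)$ along the hypothetical sequence $j = \sqrt{p}$, for which $\delta = 1/2$. The helper name `log_binom` and the chosen values of $p$ are assumptions made for illustration. The ratio approaches 1 only slowly, because the relative error is of order $1/\log p$.

```python
from math import isqrt, lgamma, log


def log_binom(p, j):
    """Natural log of the binomial coefficient C(p, j), computed via log-gamma."""
    return lgamma(p + 1) - lgamma(j + 1) - lgamma(p - j + 1)


delta = 0.5
ratios = []
for p in (10**4, 10**6, 10**8):
    j = isqrt(p)  # j = sqrt(p), so log j / log p = 1/2
    approx = j * log(p) * (1 - delta)  # leading term in Lemma B.1
    ratios.append(log_binom(p, j) / approx)

print(ratios)  # ratios decrease towards 1 as p grows
```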

Lemma B.2. Let $\chi^2_j$ be a chi-square random variable with degrees of freedom $j$. If $c \to \infty$ and
$J/c \to 0$, then

$$P(\chi^2_j > c) = \frac{1}{\Gamma(j/2)}\left(\frac{c}{2}\right)^{j/2-1}e^{-c/2}\,(1+o(1))$$

uniformly over $j \le J$.

Lemma B.3. Let $\chi^2_j$ be a chi-square random variable with degrees of freedom $j$. Let $c_j =
2j\{\log p + \log(j\log p)\}$. If $p \to \infty$, then for any $J \le p$,

$$\sum_{j=1}^{J}\binom{p}{j}P(\chi^2_j > c_j) \to 0.$$

B.2 Proof of Theorem 3.1

This appendix presents the proof of Theorem 3.1. Some of the arguments are similar to those
in Luo and Chen (2013).

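Denote by $\mathcal{M}$ the collection of models under consideration, i.e., $\mathcal{M} = \{M : |M| \le k|M_0|\}$
for some fixed $k$. We first prove that $\max_{M\ne M_0,\,M\in\mathcal{M}} r(M)/r(M_0) \xrightarrow{P} 0$. Without loss of
generality, assume that $\sigma^2 = 1$. Let $m = |M|$ and $m_0 = |M_0|$ whenever there is no ambiguity.
Notice that $m_0 = o(n)$ and $m = o(n)$. Rewrite

$$r(M)/r(M_0) = \exp\{-T_1 - T_2\},$$

where

$$T_1 = \frac{n-m-1}{2}\log\left(\frac{\mathrm{RSS}_M}{\mathrm{RSS}_{M_0}}\right),$$

$$T_2 = \frac{m-m_0}{2}\log n + \frac{m-m_0}{2}\log(\pi\,\mathrm{RSS}_{M_0}) + \log\left\{\Gamma\!\left(\frac{n-m_0}{2}\right)\Big/\Gamma\!\left(\frac{n-m}{2}\right)\right\} - \gamma\log\binom{p}{m_0} + \gamma\log\binom{p}{m}.$$

We are going to show that the following hold uniformly for all $M$:

$$\begin{aligned} T_1 &\ge \Delta(M)(1+o_p(1)) && \text{if } M_0 \not\subset M, \\ T_2 &\ge -\tfrac{3}{2}m_0\log n - \gamma m_0\log p && \text{if } M_0 \not\subset M, \end{aligned} \tag{17}$$

$$\begin{aligned} T_1 &\ge -(m-m_0)(1+\delta)\log p\,(1+o_p(1)) && \text{if } M_0 \subset M, \\ T_2 &= \tfrac{3}{2}(m-m_0)\log n\,(1+o_p(1)) + \gamma(1-\delta)(m-m_0)\log p\,(1+o(1)) && \text{if } M_0 \subset M. \end{aligned} \tag{18}$$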
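The decomposition $r(M)/r(M_0) = \exp\{-T_1 - T_2\}$ can be evaluated numerically on the log scale, which avoids overflow in the gamma functions and binomial coefficients. The sketch below follows the expressions for $T_1$ and $T_2$ as read from the display above; the function names and the example inputs (an overfitted model with one extra predictor and a slightly smaller residual sum of squares) are hypothetical.

```python
from math import lgamma, log, pi


def log_binom(p, j):
    """Natural log of the binomial coefficient C(p, j)."""
    return lgamma(p + 1) - lgamma(j + 1) - lgamma(p - j + 1)


def log_ratio(rss_m, rss_m0, m, m0, n, p, gamma):
    """log{ r(M) / r(M0) } = -T1 - T2, using the T1 and T2 written above."""
    t1 = 0.5 * (n - m - 1) * log(rss_m / rss_m0)
    t2 = (
        0.5 * (m - m0) * log(n)
        + 0.5 * (m - m0) * log(pi * rss_m0)
        + lgamma(0.5 * (n - m0)) - lgamma(0.5 * (n - m))
        - gamma * log_binom(p, m0)
        + gamma * log_binom(p, m)
    )
    return -t1 - t2


# Hypothetical inputs: n = 200, p = 2000, true model size 3, candidate size 4.
same_model = log_ratio(200.0, 200.0, 3, 3, 200, 2000, 1.0)
overfit = log_ratio(198.0, 200.0, 4, 3, 200, 2000, 1.0)
print(same_model, overfit)
```

In this toy setting the small reduction in the residual sum of squares does not compensate for the dimension penalty, so the log ratio for the larger model is negative.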

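Case 1: $M_0 \not\subset M$.

Let $\mathcal{M}_j = \{M : |M| = j,\ M \in \mathcal{M}\}$. First note that $\mathrm{RSS}_{M_0} = (n-m_0)(1+o_p(1)) =
n(1+o_p(1))$,

$$\mathrm{RSS}_M - \mathrm{RSS}_{M_0} = \Delta(M) + 2\mu^\top(I-H_M)\varepsilon - \varepsilon^\top H_M\varepsilon + \varepsilon^\top H_{M_0}\varepsilon, \tag{19}$$

and $\varepsilon^\top H_{M_0}\varepsilon = m_0(1+o_p(1))$.

Consider the second term in (19) and denote $Z_M = \mu^\top(I-H_M)\varepsilon/\sqrt{\Delta(M)}$; we have

$$\mu^\top(I-H_M)\varepsilon = \sqrt{\Delta(M)}\,Z_M$$

and $Z_M \sim N(0,1)$. Let $c_j = 2j\{\log p + \log(j\log p)\}$. For simplicity, denote $c_{|M|}$ by $c_M$. Then,
by Lemma B.3,

$$P\Bigl(\max_{M\in\mathcal{M}}|Z_M/\sqrt{c_M}| > 1\Bigr) \le \sum_{j=1}^{km_0}\sum_{M\in\mathcal{M}_j}P(Z_M^2 > c_j) \le \sum_{j=1}^{km_0}\binom{p}{j}P(\chi^2_1 > c_j) \le \sum_{j=1}^{km_0}\binom{p}{j}P(\chi^2_j > c_j) \to 0.$$

Therefore, $|\mu^\top(I-H_M)\varepsilon| \le \sqrt{\Delta(M)}\,|Z_M| \le \sqrt{\Delta(M)}\sqrt{c_M}\,(1+o_p(1))$ uniformly over $\mathcal{M}$. Since
$c_M = O(m_0\log p)$, and by the identifiability condition (11), $m_0\log p = o(\Delta(M))$ uniformly over
$M$ s.t. $M_0 \not\subset M$,

$$|\mu^\top(I-H_M)\varepsilon| = o_p(\Delta(M)).$$

Now consider the third term in (19). By Lemma B.3 again,

$$P\Bigl(\max_{M\in\mathcal{M}}\varepsilon^\top H_M\varepsilon/c_M > 1\Bigr) \le \sum_{j=1}^{km_0}\sum_{M\in\mathcal{M}_j}P(\chi^2_j(M) > c_j) = \sum_{j=1}^{km_0}\binom{p}{j}P(\chi^2_j > c_j) \to 0.$$

So $\varepsilon^\top H_M\varepsilon \le c_M(1+o_p(1))$ and

$$\varepsilon^\top H_M\varepsilon = o_p(\Delta(M))$$

uniformly over $M$ s.t. $M_0 \not\subset M$.
Therefore

$$\mathrm{RSS}_M - \mathrm{RSS}_{M_0} = \Delta(M)(1+o_p(1)),$$

and

$$\frac{\mathrm{RSS}_M}{\mathrm{RSS}_{M_0}} = 1 + \frac{\Delta(M)}{n}(1+o_p(1))$$

uniformly for all $M \in \mathcal{M}$ s.t. $M_0 \not\subset M$. Therefore

$$T_1 = \frac{n(1+o(1))}{2}\log\left(1 + \frac{\Delta(M)}{n}(1+o_p(1))\right) \ge \Delta(M)(1+o_p(1))$$

uniformly for all $M \in \mathcal{M}$ s.t. $M_0 \not\subset M$.
Moreover,

$$\frac{m-m_0}{2}\log(\pi\,\mathrm{RSS}_{M_0}) + \log\left\{\Gamma\!\left(\frac{n-m_0}{2}\right)\Big/\Gamma\!\left(\frac{n-m}{2}\right)\right\} = \frac{m-m_0}{2}\log n\,(1+o_p(1)) + \frac{m-m_0}{2}\log n\,(1+o(1)) = (m-m_0)\log n\,(1+o_p(1)).$$

Finally,

$$T_2 = \frac{3}{2}(m-m_0)\log n\,(1+o_p(1)) - \gamma\log\binom{p}{m_0} + \gamma\log\binom{p}{m} \ge -\frac{3}{2}m_0\log n\,(1+o_p(1)) - \gamma m_0\log p.$$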

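Case 2: $M_0 \subset M$.

Let $\mathcal{M}^* = \{M \in \mathcal{M},\ M_0 \subset M,\ M \ne M_0\}$ and $\mathcal{M}^*_j = \{M : |M| = j,\ M_0 \subset M\}$. First
notice that $\mathrm{RSS}_{M_0} - \mathrm{RSS}_M = \chi^2_{m-m_0}(M)$, where $\chi^2_{m-m_0}(M)$ is a chi-square random variable
depending on $M$ with degrees of freedom $m - m_0$.

Recall $c_j = 2j\{\log p + \log(j\log p)\}$. By Lemma B.3 again,

$$P\Bigl(\max_{1\le j\le km_0-m_0}\ \max_{M\in\mathcal{M}^*_{j+m_0}}\chi^2_j(M)/c_j > 1\Bigr) \le \sum_{j=1}^{km_0-m_0}P\Bigl(\max_{M\in\mathcal{M}^*_{j+m_0}}\chi^2_j(M) > c_j\Bigr) \le \sum_{j=1}^{km_0-m_0}\binom{p}{j}P(\chi^2_j > c_j) \to 0.$$

It implies that

$$\chi^2_{m-m_0}(M) \le c_{m-m_0}(1+o_p(1))$$

uniformly over $\mathcal{M}^*$. Note that $c_{m-m_0} = o(n)$ uniformly, therefore

$$\begin{aligned} T_1 = \frac{n-m-1}{2}\log\left(\frac{\mathrm{RSS}_M}{\mathrm{RSS}_{M_0}}\right) &= -\frac{n-m-1}{2}\log\left(1 + \frac{\chi^2_{m-m_0}(M)}{\mathrm{RSS}_{M_0} - \chi^2_{m-m_0}(M)}\right) \\ &\ge -\frac{n-m-1}{2}\cdot\frac{\chi^2_{m-m_0}(M)}{\mathrm{RSS}_{M_0} - \chi^2_{m-m_0}(M)} \\ &\ge -\frac{c_{m-m_0}}{2}(1+o_p(1)) \\ &\ge -(m-m_0)\left(1 + \frac{\log\{(km_0-m_0)\log p\}}{\log p}\right)\log p\,(1+o_p(1)) \\ &\ge -(m-m_0)(1+\delta)\log p\,(1+o_p(1)) \end{aligned}$$

uniformly over $\mathcal{M}^*$.

We have thus shown that

$$T_1 \ge -(m-m_0)(1+\delta)\log p\,(1+o_p(1))$$

uniformly over $\mathcal{M}^*$.

By Lemma B.1, for $m_0 < m \le km_0$, $\log\binom{p}{m} = (1-\delta)\,m\log p\,(1+o(1))$ uniformly over $\mathcal{M}^*$.
Therefore,

$$T_2 = \frac{3}{2}(m-m_0)\log n\,(1+o_p(1)) + \gamma(1-\delta)(m-m_0)\log p\,(1+o(1))$$

uniformly over $\mathcal{M}^*$.
Finally,

$$\max_{M\ne M_0,\,M\in\mathcal{M}} r(M)/r(M_0) = \max\Bigl\{\max_{M_0\not\subset M}\exp(-T_1-T_2),\ \max_{M_0\subset M}\exp(-T_1-T_2)\Bigr\}.$$

By (17), $\max_{M_0\not\subset M}\exp(-T_1-T_2) \xrightarrow{P} 0$ since

$$\min_{M_0\not\subset M}(T_1+T_2) \xrightarrow{P} \infty,$$

and by (18), $\max_{M_0\subset M}\exp(-T_1-T_2) \xrightarrow{P} 0$ provided $\gamma$ exceeds the lower bound required in
Theorem 3.1. This proves that

$$\max_{M\ne M_0,\,M\in\mathcal{M}} r(M)/r(M_0) \xrightarrow{P} 0.$$

Moreover, if (12) holds,

$$\sum_{M\ne M_0,\,M\in\mathcal{M}} r(M)/r(M_0) \le \sum_{j=1}^{km_0}\sum_{M\in\mathcal{M}_j} r(M)/r(M_0) \le km_0\max_{j}\max_{M\in\mathcal{M}_j}|\mathcal{M}_j|\,r(M)/r(M_0) \xrightarrow{P} 0.$$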